Color-CADx: a deep learning approach for colorectal cancer classification through triple convolutional neural networks and discrete cosine transform

Colorectal cancer (CRC) has a high death rate and continues to affect human lives worldwide. Histopathological examination is the standard method for CRC diagnosis; however, it is complicated, time-consuming, and subjective. Computer-aided diagnostic (CAD) systems using digital pathology can help pathologists diagnose CRC faster and more accurately than manual histopathological examination. Deep learning algorithms, especially convolutional neural networks (CNNs), are advocated for CRC diagnosis. Nevertheless, most previous CAD systems obtained features from a single CNN, and these features are of very high dimension. Moreover, they relied on spatial information only to achieve classification. In this paper, a CAD system called “Color-CADx” is proposed for CRC recognition. Different CNNs, namely ResNet50, DenseNet201, and AlexNet, are used for end-to-end classification at different training–testing ratios. Features are also extracted from these CNNs and reduced using the discrete cosine transform (DCT). DCT is additionally used to acquire a spectral representation and then to further select a reduced set of deep features. Furthermore, the DCT coefficients obtained in the previous step are concatenated, and the analysis of variance (ANOVA) feature selection approach is applied to choose significant features. Finally, machine learning classifiers are employed for CRC classification. Two publicly available datasets were investigated: the NCT-CRC-HE-100 K dataset and the Kather_texture_2016_image_tiles dataset. The highest achieved accuracy reached 99.3% for the NCT-CRC-HE-100 K dataset and 96.8% for the Kather_texture_2016_image_tiles dataset. DCT and ANOVA successfully lowered feature dimensionality, thus reducing complexity. Color-CADx has demonstrated its efficacy in terms of accuracy, as its performance surpasses that of the most recent approaches.


Related works
The need for fast and objective analysis of histopathology slides initiated the use of digital pathology systems. Work in this area falls into three main categories: detection, segmentation, and classification. This paper is concerned with classification. Previous work follows two tracks: one extracts hand-crafted features and feeds them to a classifier to obtain the result, and the other is based on DL methods.

Handcrafted feature extraction-based methods
In the handcrafted feature extraction track, the authors in 8 used six types of texture descriptors to classify eight types of CRC tissue. The texture descriptors are local binary patterns (LBP), lower-order and higher-order histogram features, Gabor filters, gray-level co-occurrence matrix (GLCM), perception-like features, and combined feature sets. The authors used a new dataset of 5,000 histological images of human CRC, and the accuracy of multiclass CRC classification reached 87.4%. Furthermore, in 11, various machine learning algorithms were used to classify CRC. Features were extracted with GLCM from 3D images in three different color spaces: RGB, HSV, and L*A*B. The authors used a training dataset of 3504 images and a testing dataset of 1496 images, and five common machine learning algorithms: Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Artificial Neural Network (ANN), Classification Decision Tree (CDT), and Quadratic Discriminant Analysis (QDA). The results showed that the methodology can detect CRC. QDA using RGB provided the best performance among the machine learning models, exceeding 97% on both the training and testing sets; this accuracy resulted from combining texture features from all color space channels. These ML-based methods for recognizing CRC have four primary constraints. First, extracting and selecting appropriate features is laborious, as it relies on trial and error. Second, the process is prone to error. Third, previous studies have employed a diverse range of classifiers with numerous parameters, and choosing an efficient classifier is a difficult undertaking 6. Finally, these methods usually produce low classification performance.

Deep learning-based methods
Currently, DL-based systems are employed to autonomously extract superior features from the data provided, and DL is an effective tool for identifying a range of health issues. The CNN is a prevalent DL architecture, and several CNN models have been investigated for the identification and categorization of CRC. For example, the Inception-v3 CNN architecture was used in the study 36. The work in this study is based on WSIs taken from several hospitals and sources: China contributed 8554 patients, the United States provided 1077 patients, and Germany provided more than 111 slides. The average accuracy and AUC reached 98.06% (95% confidence interval [CI] 97.36-98.75%) and 98.83% (95% CI 98.15-99.51%), respectively. In contrast, the authors of reference 37 proposed a two-step procedure for CRC classification. In the first step, AlexNet, a pre-trained deep CNN architecture, is used for feature extraction; multiple machine learning classifiers then perform the classification. The method reaches an average accuracy of about 99.44% on binary and multiclass datasets of histopathology cancer images. The paper 6 introduced a CNN called CRCCN-Net, designed to classify multi-class colorectal tissue histopathological images for the purpose of CRC classification. CRCCN-Net and four pre-existing CNNs, namely Xception, InceptionResNetV2, DenseNet121, and VGG16, were individually trained on the NCT-CRC-HE-100K and colorectal histology datasets. Additionally, the two datasets were combined and used to train the pre-existing models and CRCCN-Net to classify multi-class CRC. The suggested network achieved an accuracy of 93.50% on the colorectal histology dataset, 96.26% on the NCT-CRC-HE-100K dataset, and 99.21% on the combined dataset.
The ResNet architecture, along with an attention module, was employed in 26 to produce rich feature maps for classifying various tissues in histopathological images (HIs). Neighborhood component analysis (NCA) was then used to address the limitation of computational complexity, and the selected features were fed into an SVM classifier to train the model. The hybrid procedure was validated and tested on the CRC-5000 and NCT-CRC-HE-100 K datasets, attaining accuracy rates of 98.75% and 99.76% on the two datasets. The study 38 investigated different hyperparameter settings of VGG19 to classify CRC using WSIs, reaching an accuracy of 96.4%. On the other hand, the authors of 39 employed a generative adversarial network (GAN) to generate synthetic data and used Inception for CRC classification, reaching an accuracy of 89.54%. The study 40 introduced a dual-resolution DL-based framework known as WDRNet, which relies on weakly supervised learning. The annotation burden was first mitigated by using a CNN trained with weakly supervised multi-instance learning. Furthermore, a dual-stream network design was implemented to capture comprehensive context at a 5× scale and specific details at a 20× scale. The WDRNet model demonstrated a high level of accuracy in identifying tumour images, achieving an accuracy of 0.977 at the slide level and 0.953 at the patch level.
ResNet-18 and ResNet-50 were trained on colon gland images in 41 to classify CRC into two classes, benign and malignant. The models were tested on three test sets (20%, 25%, and 40% of the whole dataset). ResNet-50 provided more reliable accuracy, sensitivity, and specificity than ResNet-18 on all three test sets. The best performance, obtained on the 20% and 25% test sets, was a classification accuracy above 80%, a sensitivity above 87%, and a specificity above 83%. The authors in 42 selected an optimizer and modified the parameters of the CNN models, which, they suggested, improved classification accuracy. The trained DL methods were compared on two open histological image datasets: the first comprised 5000 H&E images of CRC, and the second was the NCT-HE-100K dataset, composed of images from nine tissue categories, with an external validation set of 7180 images. The accuracy was close to 99% on the internal testing set and 94.3% on the external testing set. ResNet50 in this study achieved an accuracy of 99.69% on the same internal testing set and 99.32% on the same external testing set, outperforming VGG19. Moreover, ResNet50 achieved 94.86% accuracy on the eight classes of the Kather-texture-2016-image-5000 dataset for comparison purposes.
The authors of reference 43 introduced a self-supervised Deep Adaptive Regularised Clustering (DARC) framework for pre-training a neural network. DARC iteratively clusters the learned representations and uses the cluster assignments as pseudo-labels to train the network's parameters. The authors designed an objective function that combines a network loss with a clustering loss through an adaptive regularisation function, which is updated dynamically during training to improve the discriminative quality of the learned representations. On the other hand, the paper 44 introduced a refined deep learning model based on VGG16 for classifying image-level textures in a CRC dataset. Because fine-tuning reduces overfitting and can significantly improve classification accuracy, particularly when the training dataset is limited, the pre-trained VGG16 model was fine-tuned. The study 45 created an approach that integrates transfer learning with a ResNet50 CNN model to improve the accuracy of classifying histopathology images of CRC, achieving a training accuracy of 99.99% and a validation accuracy of 99.77%.
The study 46 proposed an attention training mechanism embedded in a CNN for multiclass CRC classification. The NCT-CRC-100K dataset was used to validate the methodology, yielding a classification accuracy of 99.77%. In 47, the authors introduced a DL method based on unsupervised feature extraction in which sub-regions of a tissue image are quantized. The pixels of each extracted sub-region are fed into a deep belief network of consecutive restricted Boltzmann machines (RBMs), and the activation values of the hidden units in the last RBM layer are defined as the deep features of that sub-region. These deep features are then clustered to learn the quantization in an unsupervised way. The dataset was collected with a Nikon Coolscope Digital Microscope using a 20× objective lens, giving an image resolution of 480 × 640. Images are categorized into three classes: normal, low-grade cancer, and high-grade cancer. The dataset has 3236 images taken from 258 patients and is randomly divided into training and testing sets. The training set has 1644 images from 129 patients: 510 normal, 859 low-grade cancer, and 275 high-grade cancer cases. The test set, from the remaining patients, has 1592 images: 491 normal, 844 low-grade cancer, and 257 high-grade cancer cases. The average accuracy reached 96%.
Similarly, the authors of the study 48 introduced an attention mechanism called MCCBAM, which combines channel attention and spatial attention. A framework named HCCANet was built from a CNN and MCCBAM. The study used 630 histopathology images that were denoised with Gaussian filtering, and Grad-CAM was employed to make HCCANet more interpretable by visualizing regions of interest. The experimental findings show that HCCANet surpasses four state-of-the-art DL models. In the study 49
The review of prior research above indicates that most past studies depended on a single CNN design. Even studies that used multiple CNN architectures employed each one separately for classification. Combining deep features from multiple CNNs with different structures is typically preferred, as it often improves classification accuracy. Most current CADs relied on high-dimensional deep features and did not apply feature reduction, which would lower the cost of classification. Moreover, most CNNs in previous computer-aided diagnosis systems depended on spatial information alone for detecting and classifying CRC; integrating spatial and spectral information could make detection and classification more efficient. In addition, many current CAD systems only perform binary classification of whole slide images (WSIs) as either cancerous or non-cancerous, whereas identifying the subtype of CRC is crucial for determining the appropriate treatment and monitoring strategies. Furthermore, numerous current CAD systems have employed training datasets of tens to hundreds of WSIs carefully annotated by expert pathologists to mark areas of disease. Annotating WSIs is challenging and time-consuming because of their large size and the dispersed nature of tumour regions, which are often intermingled with substantial non-cancerous areas. This has made developing deep learning models for health impact assessment difficult.
This study proposes a CAD system called "Color-CADx" that classifies various CRC subclasses, aiming to address the limitations mentioned above. Color-CADx employs three CNN models with distinct architectures instead of a single one and integrates their deep features. Classification relies not only on spatial information from the images but also on spectral information. The method uses the discrete cosine transform (DCT) with zigzag scanning to merge deep features from the three CNNs and generate a spatial-spectral description. DCT is also used to decrease the large dimension of the combined features, and a feature selection approach is then used to choose important features, thereby decreasing training complexity. Color-CADx classifies images without the need to specify disease regions or use segmentation procedures.

Datasets description
This research uses two publicly available datasets for CRC classification: the NCT-CRC-HE-100 K dataset and the Kather_texture_2016_image_tiles dataset. Details of both datasets are given below.

NCT-CRC-HE-100 K dataset
The NCT-CRC-HE-100 K dataset, publicly available in 50, is a collection of distinct image patches taken from histological images of human colorectal cancer (CRC) and healthy tissue stained with hematoxylin and eosin (H&E). All images are 224 × 224 pixels at 0.5 microns per pixel (MPP) and are color-normalized using Macenko's method. The magnification factor of the images in the dataset is 20×. The dataset contains nine tissue types: adipose (ADI), background (BACK), debris (DEB), lymphocytes (LYM), mucus (MUC), smooth muscle (MUS), normal colon mucosa (NORM), cancer-associated stroma (STR), and colorectal adenocarcinoma epithelium (TUM). The images were derived from N = 86 H&E stained human cancer tissue slides of formalin-fixed paraffin-embedded (FFPE) samples provided by the National Center for Tumor Diseases in Heidelberg, Germany, and the UMM Pathology Archive (University Medical Center Mannheim, Mannheim, Germany). The distribution of images among CRC classes is shown in Table 1. Samples from the nine classes of the dataset are shown in Fig. 1.

Kather_texture_2016_image_tiles
The University Medical Centre Mannheim in Germany provides a publicly accessible dataset known as Kather texture 2016 8,51. The digitized CRC tissue slides comprise samples derived from both low- and high-grade primary tumors. The magnification factor of the images in this dataset is 20×. Figure 2 illustrates eight distinct textures observed in tumor specimens: (1) cancerous epithelium (TUMOUR), (2) stromal cells (STROMA), (3)
Table 1. The distribution of images among CRC classes for the NCT-CRC-HE-100K dataset.

Class label | No. of images

Proposed Color-CADx
To accomplish CRC classification, Color-CADx implements several steps: data preparation, deep learning model formation and training, feature extraction and reduction, feature fusion and selection, and classification. In the data preparation step, each image is resized, and the images are then split and augmented. Next, in the deep learning model formation and training step, three pre-trained CNNs, AlexNet 52, ResNet50 53, and DenseNet201 54, are constructed and retrained using the two CRC datasets. In the feature extraction and fusion step, deep features are extracted from each CNN and their dimension is reduced using DCT. These features are then fused using DCT, and their dimension is further reduced using a zigzag scanning algorithm. In parallel, the reduced DCT coefficients obtained from the three CNNs are concatenated, and a feature selection approach is used to select significant features. Lastly, individual and ensemble classifiers are employed to classify CRC images. The steps of Color-CADx are summarized in Fig. 3.

Data preparation
Initially, the CRC images of both datasets are resized to match the input size of each CNN: 224 × 224 × 3 for ResNet50 and DenseNet201, and 227 × 227 × 3 for AlexNet. Next, two different training-testing ratios are employed to split data and ensure that there is
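The splitting into 70-30 and 60-40 training-testing ratios can be sketched with scikit-learn's `train_test_split`. This is a minimal sketch assuming a stratified split (each class keeps the same share in both subsets); the text does not say whether the authors stratified, and the helper `split_dataset` and the synthetic labels are hypothetical. Resizing and augmentation are not shown.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_dataset(labels, train_fraction, seed=0):
    """Return (train_idx, test_idx) for one training-testing ratio.

    Stratification (an assumption) keeps each class's proportion equal in both subsets.
    """
    indices = np.arange(len(labels))
    train_idx, test_idx = train_test_split(
        indices,
        train_size=train_fraction,
        stratify=labels,
        random_state=seed,
    )
    return train_idx, test_idx


# Hypothetical labels: 9 classes with 100 images each.
labels = np.repeat(np.arange(9), 100)
for fraction in (0.7, 0.6):  # the 70-30 and 60-40 ratios
    train_idx, test_idx = split_dataset(labels, fraction)
    print(f"{fraction:.0%} split: {len(train_idx)} train / {len(test_idx)} test")
```

Working with index arrays rather than image arrays keeps the split cheap; the same indices can then select the resized images for each CNN.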

Deep learning model formation and training
Three DL models, AlexNet, ResNet50, and DenseNet201, are implemented using transfer learning (TL). AlexNet, despite being one of the earliest architectures, is still used because of its satisfactory performance, which stems from its computational efficiency and its strong performance on colour images 55, such as the datasets used in this paper. AlexNet has a high learning rate and training speed, which facilitates learning 56; this is vital in medical applications that require fast and precise diagnosis. It improves training efficiency without significantly increasing the workload and reduces the dependence of gradients on the initial values and scale of the parameters. The model's capacity to learn hierarchical representations enables it to capture complex patterns in medical images 56. Besides, its Local Response Normalisation (LRN) improves its ability to generalize, enabling it to detect minor differences and irregularities in medical data. ResNet, on the other hand, is employed in this study because, unlike AlexNet, it converges effectively at a reasonable computing cost even as the number of layers increases 57,58. He et al. 58 introduced a structure based on deep residual learning, which adds residual shortcut connections to a conventional CNN so that the signal can bypass certain convolution layers. These residual connections make the CNN more efficient and speed up and improve its convergence despite the large number of deep convolution layers 59. DenseNet201 is used in this work since it uses dense connections between layers to decrease the number of parameters, enhance information flow, and promote feature reuse; this parameter efficiency results in a faster and more trainable network 60.
TL is the process of using the knowledge a model gained on one task to guide the development of a second model for a comparable task. If a dataset with an insufficient number of images is used directly to train a CNN from scratch, the CNN will not achieve good training performance. Thus, TL takes a CNN pre-trained on a large dataset, such as ImageNet, and applies it to a new dataset with fewer samples, like the datasets used in our study 61. In the medical domain, TL is frequently employed because large, extensively annotated medical datasets comparable to ImageNet are scarce 62. In this study, three CNNs previously trained on ImageNet, AlexNet 52, ResNet50 53, and DenseNet201 54, are constructed. TL is used to modify the output layer of each CNN to have 9 and 8 nodes, matching the number of categories in the NCT-CRC-HE-100 K dataset and the Kather_texture_2016_image_tiles dataset, respectively. The augmented images generated in the previous phase are then used to retrain the pre-trained CNNs after adjusting certain hyperparameters, such as minibatch size, learning rate, number of epochs, and validation rate. Further details on hyperparameter fine-tuning are provided in Sect. "Experiments settings".

Feature extraction and reduction
Once the retraining of each CNN is complete, TL is applied again to obtain deep features from each CNN. For ResNet50 and DenseNet-201, features are acquired from the final average pooling layer, whereas for AlexNet, features are obtained from the fully connected layer called "fc7". The dimensions of these features are 2048, 1920, and 4096 for ResNet50, DenseNet-201, and AlexNet, respectively. Because these dimensions are large, a feature reduction step is required; therefore, DCT is adopted to reduce the dimension of each feature set independently. The DCT is a widely used linear transformation in signal analysis and processing that decomposes the input data into its frequency components 51. Analyzing the input values with DCT produces a matrix of DCT coefficients, of which only a subset is retained while the rest are discarded. The main advantage of DCT is its energy compaction property: most of the important information is concentrated in the low-frequency band. This property can be used to reduce feature dimensions while leaving the most important features unchanged, so that data quality is preserved. This is valuable in scientific applications where dimension reduction must be balanced against quality maintenance 63. A critical step of this reduction is the selection of the DCT coefficients; a conventional method such as zigzag scanning is usually used to select them 53. Therefore, in this study, DCT with zigzag scanning is used to reduce the size of the features and obtain spectral information.
www.nature.com/scientificreports/
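The DCT reduction and zigzag selection described above can be sketched with SciPy's orthonormal DCT-II (`scipy.fft.dct` and `scipy.fft.dctn`). Assumptions: each CNN's feature vector is treated as a 1D signal and truncated to its first k coefficients (for a 1D signal, zigzag order reduces to plain index order), while `zigzag_select` shows the JPEG-style zigzag order over a 2D coefficient array. The helper names and the value of k are hypothetical, not the authors' settings.

```python
import numpy as np
from scipy.fft import dct, dctn


def dct_reduce(features: np.ndarray, k: int) -> np.ndarray:
    """Keep the first k orthonormal DCT-II coefficients of each feature vector (row)."""
    coeffs = dct(features, type=2, norm="ortho", axis=1)
    return coeffs[:, :k]


def zigzag_indices(n_rows: int, n_cols: int):
    """JPEG-style zigzag order over a 2D coefficient array, lowest frequencies first."""
    order = []
    for s in range(n_rows + n_cols - 1):  # s = row + col identifies an anti-diagonal
        rows = range(max(0, s - n_cols + 1), min(s, n_rows - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run from bottom-left to top-right
        order.extend((i, s - i) for i in rows)
    return order


def zigzag_select(block: np.ndarray, k: int) -> np.ndarray:
    """2D DCT of a block, then the first k coefficients in zigzag order."""
    coeffs = dctn(block, type=2, norm="ortho")
    return np.array([coeffs[i, j] for i, j in zigzag_indices(*coeffs.shape)[:k]])


rng = np.random.default_rng(0)
# Hypothetical smooth "feature vectors": random walks concentrate energy at low frequencies.
features = np.cumsum(rng.normal(size=(4, 2048)), axis=1)
reduced = dct_reduce(features, 256)
kept = (reduced ** 2).sum() / (features ** 2).sum()  # orthonormal DCT preserves total energy
print(reduced.shape, f"energy kept: {kept:.4f}")
```

How much energy the leading coefficients hold depends on the data. Deep-feature indices carry no inherent spatial order, so the ablation over the number of retained coefficients, rather than the energy figure alone, is what decides a good k.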

Feature fusion and selection
In this step, the deep features of the three CNNs are combined using DCT, and an ablation study is conducted to select the optimal number of features retained after DCT based on zigzag scanning. Feature selection (FS) is an important process that identifies the most useful variables in a feature space and reduces its overall size, resulting in improved performance [64][65][66]. FS can discard redundant or irrelevant features, speeds up training, reduces training complexity, and helps mitigate overfitting. Therefore, a second FS approach is investigated in this research. First, the DCT features acquired from the three CNNs in the previous step are concatenated. Then, analysis of variance (ANOVA) FS 67 is performed to further reduce the feature dimension, thus lowering the complexity of classification. ANOVA is a suitable FS method because it detects significant differences between groups in a simple manner, rapidly identifies important features for prediction, works with multiple classes and variables, assists in sensitivity analysis, helps reduce overfitting, is computationally efficient, and offers results that are easy to interpret 68. Another ablation study is conducted to show the accuracy versus the number of features selected using ANOVA.
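The concatenation-plus-ANOVA branch can be sketched with scikit-learn's `SelectKBest` and `f_classif` (a one-way ANOVA F-test per feature). The feature dimensions match the text (2048, 1920, and 4096), but the synthetic data, `k_dct=500`, `k_anova=100`, and the helper names are hypothetical.

```python
import numpy as np
from scipy.fft import dct
from sklearn.feature_selection import SelectKBest, f_classif


def dct_reduce(features, k):
    """First k orthonormal DCT-II coefficients of each row."""
    return dct(features, type=2, norm="ortho", axis=1)[:, :k]


def concat_and_select(feature_sets, y, k_dct, k_anova):
    """DCT-reduce each CNN's features, concatenate them, keep the k_anova best ANOVA F-scores."""
    fused = np.hstack([dct_reduce(f, k_dct) for f in feature_sets])
    selector = SelectKBest(score_func=f_classif, k=k_anova).fit(fused, y)
    return selector.transform(fused), selector


rng = np.random.default_rng(0)
n_images, n_classes = 300, 3
y = rng.integers(0, n_classes, size=n_images)
# Hypothetical stand-ins for ResNet50, DenseNet201 and AlexNet features, each shifted by
# the class label so that the DC coefficient of every block becomes discriminative.
feature_sets = [rng.normal(size=(n_images, d)) + y[:, None] for d in (2048, 1920, 4096)]
selected, selector = concat_and_select(feature_sets, y, k_dct=500, k_anova=100)
print(selected.shape)
```

In practice the selector should be fitted on the training split only and then applied to the test split, so that test labels never influence which features are kept.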
Balancing feature dimensionality reduction against retaining essential information is a key issue in machine learning, especially for classification tasks. Dimensionality reduction techniques decrease the number of features in a dataset to tackle problems such as computational complexity and overfitting. Nevertheless, this reduction must be carefully monitored to guarantee that crucial information is preserved for precise classification. Excessive reduction could eliminate features that carry discriminative information, impairing the model's ability to categorize samples correctly. Conversely, keeping too many features may add noise and make the model overly complex, impeding its ability to generalize to unseen data. Striking the right balance requires selecting dimensionality reduction methods suited to the data's attributes and tuning parameters to regulate the level of reduction while preserving the most pertinent information. Robust evaluation metrics are needed to analyze the effect of dimensionality reduction on classification accuracy and to find the best balance between simplifying the model and retaining important information. Achieving this equilibrium is essential for developing models that are computationally efficient, generalize well to new data, and maintain high classification accuracy 69,70. Therefore, in this study, ablation studies are conducted to examine the trade-off between the number of features and classification accuracy.

Classification
To classify CRC, Color-CADx uses an individual classifier, Cubic SVM, and an ensemble classifier, ensemble subspace discriminant (ESD). Color-CADx performs classification through four experiments. Experiment I investigates end-to-end classification using three CNNs: AlexNet, ResNet50, and DenseNet201. To address overfitting, it uses two training-testing ratios, 70-30 and 60-40. In Experiment II, deep features are extracted from each CNN and fed into two shallow classifiers, ESD and Cubic SVM. Experiment III assesses the use of DCT for feature reduction. In Experiment IV, the features from the different networks are combined using DCT, the zigzag scanning approach is applied to the fused features, and the selected features are fed to the same shallow classifiers. In parallel, the DCT features obtained from the three CNNs in Experiment III are concatenated, and ANOVA FS is adopted to select significant features, which are then fed to the shallow classifiers.
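The two shallow classifiers can be approximated in scikit-learn. This sketch assumes that "Cubic SVM" means an SVM with a degree-3 polynomial kernel and that ESD means a random-subspace ensemble of linear discriminant learners; the kernel scaling, the number of learners (30), the subspace fraction (0.5), and the synthetic data standing in for deep features are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def cubic_svm():
    """SVM with the polynomial kernel (gamma * <x, x'> + 1)^3 on standardized features."""
    return make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3, coef0=1.0, gamma="scale"))


def subspace_discriminant(n_learners=30, subspace_fraction=0.5, seed=0):
    """Random-subspace ensemble: every LDA learner sees all samples but a random feature subset."""
    return BaggingClassifier(
        LinearDiscriminantAnalysis(),
        n_estimators=n_learners,
        max_samples=1.0,
        bootstrap=False,  # no resampling of images
        max_features=subspace_fraction,
        bootstrap_features=False,  # features drawn without replacement
        random_state=seed,
    )


X, y = make_classification(n_samples=600, n_features=20, n_informative=10, n_redundant=0,
                           n_classes=3, n_clusters_per_class=1, class_sep=2.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, stratify=y, random_state=0)
scores = {name: make().fit(X_tr, y_tr).score(X_te, y_te)
          for name, make in (("Cubic SVM", cubic_svm), ("ESD", subspace_discriminant))}
print(scores)
```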

Validation metrics
The results of the proposed CAD framework are validated using several statistical metrics: F1-score, precision, accuracy, specificity (true negative rate, TNR), and sensitivity (recall, or true positive rate, TPR). Equations 1, 2, 3, 4 and 5 are used to compute these measures 12,18, where, for each CRC class, TP is the number of images correctly classified as the class they actually belong to; TN is the number of images that do not belong to that class and are not classified as it; FP is the number of images classified as that class that do not truly belong to it; and FN is the number of images that belong to that class but are not classified as it.
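The per-class definitions of TP, TN, FP, and FN can be made concrete with a one-vs-rest computation from a multiclass confusion matrix. This sketch uses the standard formulas (precision = TP/(TP+FP), sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), F1 = 2 · precision · sensitivity/(precision + sensitivity), and accuracy as correctly classified images over all images); it is not the authors' code, and the function names are hypothetical. The across-class standard deviation uses the population form, which is also an assumption.

```python
import numpy as np


def per_class_metrics(cm):
    """Per-class metrics from a confusion matrix with cm[i, j] = images of true class i predicted as j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp      # predicted as the class, but truly another class
    fn = cm.sum(axis=1) - tp      # truly the class, but predicted as another class
    tn = cm.sum() - tp - fp - fn  # neither truly the class nor predicted as it
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # recall / true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = tp.sum() / cm.sum()  # overall multiclass accuracy
    return {"precision": precision, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1, "accuracy": accuracy}


def summarize(metrics):
    """Mean and (population) standard deviation of each per-class metric."""
    return {name: (values.mean(), values.std())
            for name, values in metrics.items() if name != "accuracy"}


cm = np.array([[8, 2],
               [1, 9]])
m = per_class_metrics(cm)
print(m["accuracy"], m["sensitivity"], m["specificity"])
print(summarize(m))
```

A class that is never predicted gives a zero denominator in precision (NumPy returns `nan` with a warning), which is worth guarding against on small test sets.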

Hyperparameters finetuning
The hyperparameters are set as follows. The minibatch size, which is the amount of data used for each weight update within an epoch, is set to 10, since small batch sizes usually achieve the best generalization performance. The learning rate determines the step size at each iteration while moving towards a minimum of the loss function; in our experiments, it was set to 0.0001. The maximum number of epochs was set to 20, as increasing the number of epochs did not improve performance. The three networks are trained with stochastic gradient descent with momentum, as this improves the rate of convergence and helps avoid getting trapped in a local minimum. Two distinct training-testing ratios, 70-30 and 60-40, are used to divide the data to prevent overfitting; these ratios were selected because they are commonly used in the literature [71][72][73][74][75].
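The momentum update can be sketched in NumPy on a toy least-squares problem, using minibatches of 10 and 20 epochs as in the text. The momentum coefficient 0.9 is an assumption (it is not stated here), the velocity form below is one common convention, and the toy problem uses a learning rate of 0.01 instead of 0.0001 so that it converges within 20 epochs.

```python
import numpy as np


def sgdm_step(w, velocity, grad, lr, momentum):
    """One SGD-with-momentum update: velocity <- momentum*velocity - lr*grad; w <- w + velocity."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity


def train_least_squares(X, y, lr, momentum=0.9, batch_size=10, epochs=20, seed=0):
    """Minibatch SGDM on the half mean squared error of a linear model."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    velocity = np.zeros_like(w)
    for _ in range(epochs):
        order = rng.permutation(len(X))  # reshuffle the data every epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ w - y[idx]
            grad = X[idx].T @ residual / len(idx)  # gradient of 0.5 * mean squared error
            w, velocity = sgdm_step(w, velocity, grad, lr, momentum)
    return w


rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
w_true = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ w_true
w_hat = train_least_squares(X, y, lr=0.01)
print(np.round(w_hat, 3))
```

The velocity term accumulates past gradients, so consistent descent directions are amplified while oscillating components partly cancel, which is the convergence benefit the paragraph refers to.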

Results
This research investigates several models for CRC classification, which are evaluated through several experiments. Experiment I investigates end-to-end classification using three CNNs, AlexNet, ResNet50, and DenseNet201, with two training-testing ratios (70-30 and 60-40) to overcome overfitting. In Experiment II, features are extracted from each network and passed to the SVM and ESD classifiers. Experiment III evaluates the use of DCT for feature reduction. Next, in Experiment IV, features from the different networks are fused using DCT, and zigzag scanning is used to select features, which are fed to the SVM and ESD classifiers. In addition, the DCT features obtained from each CNN are concatenated, and ANOVA FS is applied to select a smaller set of significant features. This section presents the results of each experiment.

Experiment I results
In this section, the results of the end-to-end classification of AlexNet, DenseNet201, and ResNet50 are shown. The accuracy results are given in Tables 3 and 4 for the 70-30 and 60-40 ratios, respectively, for the Kather_texture_2016_image_tiles and NCT-CRC-HE-100K datasets. As can be noted from Table 3

Experiment III results
This experiment demonstrates the effectiveness of DCT for feature reduction 76. Features extracted from each CNN are fed into the DCT. The results obtained after DCT reduction on the NCT-CRC-HE-100K and Kather_texture_2016_image_tiles datasets are shown in Tables 7 and 8, respectively.

Experiment IV results
In this experiment, all the features extracted from the three CNNs at the two selected training-testing ratios are fused using DCT. Different feature lengths are investigated to choose the optimum length that yields the best accuracy. Since Cubic SVM consistently outperformed ESD in Experiments II and III for both datasets, only Cubic SVM is used in Experiments IV and V. The results are charted in Figs. 4 and 5 for the Kather_texture_2016_image_tiles dataset and in Figs. 6 and 7 for the NCT-CRC-HE-100K dataset, for the 70-30 and 60-40 split ratios.
As shown in Figs. 4 and 5, for the Kather_texture_2016_image_tiles dataset, after fusing features using DCT, only 4000 and 5000 coefficients are needed to reach peak accuracies of 96.8% and 97.0% for the 70-30 and 60-40 splits, respectively. These accuracies are higher than those attained in Experiment III (Table 8). These results verify that DCT is capable of fusing features while reducing their dimension. Moreover, the spatial-spectral representation outperforms the purely spatial representation. The reduction in feature dimensionality corresponds to a decrease of almost 50% in the length of the feature vector, which lowers the computational complexity.
Table 7. Accuracy results (%) after applying the DCT process on deep features of the three CNNs for 70-30 and 60-40 ratios for the NCT-CRC-HE-100K dataset.
Figures 6 and 7 illustrate the results obtained for the NCT-CRC-HE-100K dataset for both split ratios. When DCT is applied to fuse the features, only 4000 coefficients are needed to achieve peak accuracy, namely 99.3% for both the 70-30 and 60-40 splits. These accuracies are almost identical to those obtained in Experiment III (Table 7). These results confirm that DCT can fuse features while decreasing their size; as for the first dataset, the feature vector is reduced by nearly 50%, resulting in lower computational complexity.
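The DCT-based fusion step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the deep features of the three CNNs are concatenated per image (4096 + 2048 + 1920 = 8064 values for AlexNet, ResNet50, and DenseNet201), transformed with an orthonormal 1-D DCT-II via SciPy's `scipy.fft.dct`, and truncated to the first `n_coeffs` low-frequency coefficients. The helper name `dct_fuse` is hypothetical. Under this assumption, keeping 4000 of 8064 coefficients removes about 50% of the feature vector, in line with the reduction reported above.

```python
import numpy as np
from scipy.fft import dct


def dct_fuse(feature_sets, n_coeffs):
    """Hypothetical helper: fuse per-CNN deep features with a 1-D DCT.

    Assumption (not confirmed by the paper): features are concatenated per
    sample before the DCT, and the first n_coeffs coefficients are kept.
    feature_sets: list of arrays shaped (n_samples, d_i), one per CNN.
    Returns an array shaped (n_samples, n_coeffs).
    """
    fused = np.concatenate(feature_sets, axis=1)        # (n_samples, sum of d_i)
    coeffs = dct(fused, type=2, norm="ortho", axis=1)   # orthonormal DCT-II per sample
    return coeffs[:, :n_coeffs]                         # low-frequency coefficients come first


# Usage with random stand-ins for AlexNet, ResNet50, and DenseNet201 features.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((5, d)) for d in (4096, 2048, 1920)]
reduced = dct_fuse(feats, n_coeffs=4000)
print(reduced.shape)                                    # (5, 4000)
print(1 - 4000 / sum(f.shape[1] for f in feats))        # fraction of features removed
```

Because the orthonormal DCT preserves each vector's norm, keeping all 8064 coefficients loses no information; truncation keeps only the leading coefficients, which is why several lengths are compared in this experiment.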
Other performance metrics, namely F1-score, precision, specificity, and sensitivity (recall), are calculated for the highest accuracies achieved in Experiment IV using the Cubic SVM and are given in Table 9 for the NCT-CRC-HE-100K and Kather_texture_2016_image_tiles datasets. The mean and standard deviation of each metric are calculated over all classes. The standard deviation measures the dispersion of a set of values: the lower it is, the closer the values lie to the mean (expected value), whereas a higher standard deviation indicates a wider spread. DenseNet201 consistently provided the best accuracies, and its features are the ones used in Experiment IV. Also, the 70-30 training-testing ratio is used in Experiment IV because it attained higher performance than the 60-40 split. Table 9 shows that, with the 70-30 split, the average precision, specificity, sensitivity, and F1-score are 0.9672, 0.9952, 0.9664, and 0.9667 for the Kather_texture_2016_image_tiles dataset and 0.9924, 0.9990, 0.9923, and 0.9924 for the NCT-CRC-HE-100K dataset. Furthermore, the confusion matrices and receiver operating characteristic (ROC) curves for both datasets are plotted in Figs. 8 and 9, respectively, and the area under the ROC curve (AUC) is also calculated.
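The per-class metrics and their mean and standard deviation can be computed directly from a multi-class confusion matrix. The sketch below is illustrative only: it assumes rows hold true labels and columns hold predictions, uses one-vs-rest counts for each class, and reports the population standard deviation (NumPy's default `ddof=0`); the paper does not state which convention it uses. The function names and the toy 3-class matrix are hypothetical and are not data from this study.

```python
import numpy as np


def per_class_metrics(cm):
    """One-vs-rest precision, sensitivity, specificity, and F1 per class.

    cm: square confusion matrix with rows = true class, columns = predicted class.
    A class with no predicted or no true samples would cause division by zero.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp          # predicted as the class but belonging elsewhere
    fn = cm.sum(axis=1) - tp          # belonging to the class but predicted elsewhere
    tn = cm.sum() - tp - fp - fn      # all remaining samples
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)      # recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"precision": precision, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


def summarize(metrics):
    """Mean and population standard deviation of each metric across classes."""
    return {name: (float(v.mean()), float(v.std())) for name, v in metrics.items()}


# Hypothetical 3-class example.
cm = [[8, 1, 1],
      [0, 9, 1],
      [1, 0, 9]]
for name, (mean, std) in summarize(per_class_metrics(cm)).items():
    print(f"{name}: {mean:.4f} +/- {std:.4f}")
```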
Experiment IV also involves concatenating the DCT features acquired from the three CNNs and then applying the ANOVA FS approach to select a reduced set of features. The results of the Cubic SVM classifier are shown in Table 10 for both datasets. Table 10 demonstrates that for the NCT-CRC-HE-100K dataset, the highest accuracy of 99.3% is achieved with 2000 features, whereas for the Kather_texture_2016_image_tiles dataset, the maximum accuracy of 96.8% is obtained using 1000 features; both counts are much lower than the 4000 DCT coefficients required when the features were fused with DCT alone. In addition, the AUC is computed. The confusion matrices are shown in Fig. 10. As depicted in Fig. 9, the area under the curve (AUC) for both datasets is either 1 or very close to 1.
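The ANOVA-based selection step can be sketched with scikit-learn. This is a minimal sketch under assumptions, not the authors' code: the per-CNN DCT feature blocks are concatenated with `np.hstack`, `SelectKBest(f_classif, k=...)` keeps the `k` features with the largest ANOVA F-statistics, and the Cubic SVM is approximated by `SVC(kernel="poly", degree=3, coef0=1.0)` on standardized features, since the kernel settings are not given here. Synthetic data from `make_classification` stands in for real DCT features, and `anova_cubic_svm` is a hypothetical helper name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def anova_cubic_svm(k):
    """ANOVA F-test feature selection followed by a degree-3 polynomial SVM.

    The kernel settings are an assumption chosen to resemble a 'Cubic SVM'.
    """
    return make_pipeline(
        SelectKBest(score_func=f_classif, k=k),       # keep the k highest F-scores
        StandardScaler(),                             # standardize the selected features
        SVC(kernel="poly", degree=3, coef0=1.0),      # (gamma * <x, y> + 1) ** 3
    )


# Synthetic stand-in: 90 features split into three "CNN" blocks of 30 each.
X, y = make_classification(n_samples=300, n_features=90, n_informative=10,
                           n_classes=3, random_state=0)
blocks = np.split(X, 3, axis=1)
X_concat = np.hstack(blocks)                          # concatenate the DCT feature blocks

# 70-30 training-testing split, as in the experiments.
X_tr, X_te, y_tr, y_te = train_test_split(X_concat, y, test_size=0.3,
                                          stratify=y, random_state=0)
model = anova_cubic_svm(k=20).fit(X_tr, y_tr)
print("selected:", model.named_steps["selectkbest"].get_support().sum())
print("test accuracy:", model.score(X_te, y_te))
```

Because the F-statistic of each feature is unaffected by per-feature scaling, placing the selector before the scaler does not change which features are kept.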

Discussion
This study aims to assess the effectiveness of using ensemble CNNs and transfer learning to automatically detect colorectal cancer in WSIs. It also investigates the capacity of DCT as a feature reduction and fusion algorithm. To this end, a CAD system called "Color-CADx" is designed to accurately classify various subclasses of colorectal cancer (CRC). Color-CADx employs three convolutional neural network (CNN) models with distinct architectures rather than a single model, and it combines the deep features of these CNNs. The classification does not rely solely on the spatial information provided by the images but also uses a spectral representation. Thus, Color-CADx uses the discrete cosine transform (DCT) with zigzag scanning to merge the deep features from the three CNNs and obtain a spatial-spectral representation. The DCT is also employed to reduce the vast dimensions of the combined features. The classification process in Color-CADx is carried out through four experiments. Experiment I examines the performance of three CNNs (AlexNet, ResNet50, and DenseNet201) in end-to-end classification. To address the problem of overfitting, this investigation employs two distinct training-testing ratios, namely 70-30 and 60-40. In Experiment II, deep features are extracted from each CNN and fed into two shallow classifiers: ESD and Cubic SVM. Experiment III evaluates the use of DCT to reduce the number of features. In Experiment IV, the features obtained from the multiple networks are aggregated using DCT, zigzag scanning is then applied to the merged features, and the selected features are fed into the same shallow classifiers. Furthermore, in Experiment IV, the DCT features obtained from the three CNNs are concatenated and the ANOVA FS method is used to choose the most important features, which are subsequently used as input to the shallow classifiers.
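Zigzag scanning orders 2-D DCT coefficients from low to high frequency, so truncating the scanned sequence keeps the low-frequency part. The sketch below uses the standard JPEG-style zigzag order on a 2-D coefficient matrix computed with SciPy's `scipy.fft.dctn`. How Color-CADx arranges deep features into a 2-D array before the DCT is not described in this section, so the input layout and the helper names `zigzag_indices` and `dct2_zigzag` are hypothetical.

```python
import numpy as np
from scipy.fft import dctn


def zigzag_indices(rows, cols):
    """(row, col) pairs in JPEG-style zigzag order for a rows x cols matrix."""
    order = []
    for s in range(rows + cols - 1):                  # s = row + col indexes an anti-diagonal
        diag = [(r, s - r) for r in range(max(0, s - cols + 1), min(rows, s + 1))]
        if s % 2 == 0:
            diag.reverse()                            # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order


def dct2_zigzag(matrix, n_coeffs):
    """2-D orthonormal DCT-II, then keep the first n_coeffs coefficients in zigzag order."""
    coeffs = dctn(np.asarray(matrix, dtype=float), type=2, norm="ortho")
    idx = zigzag_indices(*coeffs.shape)[:n_coeffs]
    return np.array([coeffs[r, c] for r, c in idx])


print(zigzag_indices(3, 3))
# [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (1, 2), (2, 1), (2, 2)]
print(dct2_zigzag(np.ones((4, 4)), n_coeffs=3))       # only the DC term is non-zero (up to rounding)
```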
Experiment II shows that extracting deep features using transfer learning is superior to end-to-end classification: the accuracies in Tables 5 and 6 (Experiment II), obtained with 4096, 2048, and 1920 features for AlexNet, ResNet50, and DenseNet201, respectively, are higher than those in Tables 3 and 4 (Experiment I). Experiment III shows that DCT reduced the feature dimensionality to 1500, 1200, and 1000 features for AlexNet, ResNet50, and DenseNet201, respectively; this reduction improved the accuracy of AlexNet and ResNet50, whereas the accuracy of DenseNet201 decreased slightly. In Experiment IV, when DCT was used to fuse the features of the three CNNs and zigzag scanning was applied to select features, accuracies of 99.3% and 96.8% were achieved for the NCT-CRC-HE-100K and Kather_texture_2016_image_tiles datasets, respectively. This verifies that DCT fusion enhances performance, except relative to DenseNet201, which reached almost the same accuracy. In addition, when the DCT features obtained from the three CNNs were concatenated and ANOVA FS was applied, the achieved accuracies were again 99.3% and 96.8% for the NCT-CRC-HE-100K and Kather_texture_2016_image_tiles datasets, respectively, using only 2000 features for the former and 1000 for the latter.
Color-CADx has proven effective in correctly categorizing CRC histopathological images. Therefore, it can serve as a valuable tool to help medical professionals and technicians accurately determine the specific tissue type during examination. Consequently, cancerous specimens are less likely to go unnoticed, so that patients more often receive appropriate and timely treatment.
In a related study, the authors compared handcrafted feature extraction methods with deep learning-based approaches. Four CNN architectures were assessed: ResNet-101, ResNeXt-50, Inception-v3, and DenseNet-161. They also proposed two ensemble CNN methods: Mean-Ensemble-CNN and Neural Network-Ensemble-CNN. The experimental results demonstrated that the proposed methods surpassed both the hand-crafted feature-based techniques and the individual CNN architectures.

Figure 3. Summary of the steps of Color-CADx.

Figure 8. Confusion matrices obtained with Cubic SVM trained with 4000 DCT components from the Kather_texture_2016_image_tiles dataset and the NCT-CRC-HE-100K dataset using the 70-30 split ratio.

Figure 9. ROC curves obtained with Cubic SVM trained with 4000 DCT components from the Kather_texture_2016_image_tiles dataset and the NCT-CRC-HE-100K dataset using the 70-30 split ratio.

Figure 10. Confusion matrices obtained with Cubic SVM trained with 1000 features from the Kather_texture_2016_image_tiles dataset and 2000 features from the NCT-CRC-HE-100K dataset using the 70-30 split ratio.

Figure 11. ROC curves obtained with Cubic SVM trained with 1000 features from the Kather_texture_2016_image_tiles dataset and 2000 features from the NCT-CRC-HE-100K dataset using the 70-30 split ratio.

Table 2. The distribution of images among CRC classes for the Kather texture 2016 dataset.

Table 3. Accuracy results for end-to-end classification for the Kather_texture_2016_image_tiles dataset.
As shown in Table 4, for the NCT-CRC-HE-100K dataset, ResNet50 and DenseNet201 provided the best classification accuracies.

Table 6. Accuracy results (%) for the features of the CNNs used, for 70-30 and 60-40 ratios, for the Kather_texture_2016_image_tiles dataset.
DenseNet201 provided the best classification accuracies, with a maximum accuracy of 97.1% and an average of 97% for the Kather_texture_2016_image_tiles dataset. The Cubic SVM classifier performed better than the ensemble (ESD) classifier.

Table 8. Accuracy results (%) after applying the DCT process on deep features of the three CNNs for 70-30 and 60-40 ratios for the Kather_texture_2016_image_tiles dataset.

Table 9. F1-score, precision, specificity, and sensitivity achieved with Cubic SVM trained with 4000 DCT components from the Kather_texture_2016_image_tiles dataset and the NCT-CRC-HE-100K dataset using the 70-30 split ratio.

Table 10. Accuracy results (%) versus the number of features obtained with ANOVA FS.

Table 11. F1-score, precision, specificity, and sensitivity achieved with Cubic SVM trained with 1000 and 2000 features from the Kather_texture_2016_image_tiles dataset and the NCT-CRC-HE-100K dataset, respectively, using the 70-30 split ratio.

Table 11 presents a comparison of the results achieved by the proposed framework with current state-of-the-art methods for classifying CRC tissue. The suggested approach demonstrates superior performance compared to the